Local Concept Embeddings for Analysis of Concept Distributions in Vision DNN Feature Spaces
Abstract: Insights into the learned latent representations are imperative for verifying deep neural networks (DNNs) in critical computer vision (CV) tasks. Therefore, state-of-the-art supervised concept-based explainable artificial intelligence (C-XAI) methods associate each user-defined concept, such as "car", with a single vector in the DNN latent space (a concept embedding vector). In the case of concept segmentation, these vectors linearly separate activation-map pixels belonging to the concept from those belonging to the background. Existing methods for concept segmentation, however, fall short of capturing implicitly learned sub-concepts (e.g., the DNN might split "car" into "proximate car" and "distant car") and overlap between user-defined concepts (e.g., between "bus" and "truck"). In other words, they do not capture the full distribution of concept representatives in latent space. For the first time, this work shows that these simplifying assumptions are frequently violated and that distribution information can be particularly useful for understanding DNN-learned notions of sub-concepts, concept confusion, and concept outliers. To allow exploration of learned concept distributions, we propose a novel local concept analysis framework. Instead of optimizing a single global concept vector on the complete dataset, it generates a local concept embedding (LoCE) vector for each individual sample. We then explore the distribution formed by these LoCEs via Gaussian mixture models (GMMs), hierarchical clustering, and concept-level information retrieval and outlier detection. Despite its context sensitivity, our method's concept segmentation performance is competitive with global baselines. Analysis results are obtained on three datasets and six diverse vision DNN architectures, including vision transformers (ViTs).
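The abstract summarizes the core idea: per-sample concept vectors are optimized to linearly separate concept pixels from background in an activation map, and the resulting collection of vectors is analyzed as a distribution. The following minimal sketch (not the authors' reference implementation) illustrates that idea under assumed shapes and names: `activation` is a (C, H, W) feature map of one image, `mask` is the binary concept segmentation resized to (H, W), and `samples` is an iterable of such pairs; optimizer settings and the number of GMM components are placeholders.

```python
import torch
import torch.nn.functional as F
from sklearn.mixture import GaussianMixture


def fit_loce(activation: torch.Tensor, mask: torch.Tensor,
             steps: int = 200, lr: float = 0.1) -> torch.Tensor:
    """Fit one local concept embedding (LoCE) for a single sample:
    a channel-weight vector v (plus bias b) whose per-pixel projection
    of the activation map reproduces the concept segmentation mask."""
    c = activation.shape[0]
    v = torch.zeros(c, requires_grad=True)
    b = torch.zeros(1, requires_grad=True)
    opt = torch.optim.Adam([v, b], lr=lr)
    target = mask.float()
    for _ in range(steps):
        opt.zero_grad()
        # Linear projection of each spatial activation vector onto v.
        logits = torch.einsum("c,chw->hw", v, activation) + b
        loss = F.binary_cross_entropy_with_logits(logits, target)
        loss.backward()
        opt.step()
    return v.detach()


def analyze_concept(samples, n_components: int = 3):
    """Collect LoCEs for all samples of one concept and fit a GMM over them:
    mixture components suggest sub-concepts, low likelihood flags outliers."""
    loces = torch.stack([fit_loce(a, m) for a, m in samples]).numpy()
    gmm = GaussianMixture(n_components=n_components).fit(loces)
    sub_concept_ids = gmm.predict(loces)       # candidate sub-concept per sample
    log_likelihood = gmm.score_samples(loces)  # low values = concept outliers
    return sub_concept_ids, log_likelihood
```

Hierarchical clustering of the LoCEs (e.g., with scipy's linkage routines) or nearest-neighbor retrieval over them can be substituted for the GMM step, as the abstract indicates; the sketch only shows the distribution-fitting variant.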
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In Proceedings of the AAAI conference on artificial intelligence, pages 3681–3688, 2019a. Ghorbani et al. [2019b] Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018.
- Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of explainable artificial intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This looks like that… does it? Shortcomings of latent space prototype interpretability in deep networks. arXiv:2105.02968 [cs], 2021.
- Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling deep learning interpretability by visualizing activation and attribution summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(E) (working draft) edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. 
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 
Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. 
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. 
Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. 
Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. 
https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. 
In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. 
[2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. 
Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. 
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- CRAFT: Concept recursive activation factorization for explainability. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2711–2721, 2023.
- Ruth Fong and Andrea Vedaldi. Net2Vec: Quantifying and explaining how concepts are encoded by filters in deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 8730–8738, 2018.
- Fabian B. Fuchs, Oliver Groth, Adam R. Kosiorek, Alex Bewley, Markus Wulfmeier, Andrea Vedaldi, and Ingmar Posner. Neural Stethoscopes: Unifying analytic, auxiliary and adversarial network probing. CoRR, abs/1806.05502, 2018.
- Yunhao Ge, Yao Xiao, Zhi Xu, Meng Zheng, Srikrishna Karanam, Terrence Chen, Laurent Itti, and Ziyan Wu. A peek into the reasoning of neural networks: Interpreting with structural visual concepts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2195–2204, 2021.
- Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3681–3688, 2019.
- Amirata Ghorbani, James Wexler, James Y. Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019.
- Bryce Goodman and Seth Flaxman. European Union regulations on algorithmic decision-making and a "right to explanation". AI Magazine, 38(3):50–57, 2017.
- Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018.
- Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021.
- Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1st edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, WD01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symposium, Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In Proceedings of the AAAI conference on artificial intelligence, pages 3681–3688, 2019a. Ghorbani et al. [2019b] Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. 
ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. 
Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. 
https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. 
[2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. 
Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. 
[2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. 
Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. 
[2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. 
Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 
3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. 
[2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. 
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021.
ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 
3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. 
GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. 
[2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 
Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. 
[2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. 
PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. 
[2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. 
In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. 
ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. 
Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. 
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 
Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. 
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. 
In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. 
Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
- Neural Stethoscopes: Unifying analytic, auxiliary and adversarial network probing. CoRR, abs/1806.05502, 2018.
- Yunhao Ge, Yao Xiao, Zhi Xu, Meng Zheng, Srikrishna Karanam, Terrence Chen, Laurent Itti, and Ziyan Wu. A peek into the reasoning of neural networks: Interpreting with structural visual concepts. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2195–2204, 2021.
- Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 3681–3688, 2019a.
- Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b.
- Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017.
- Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018.
- Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021.
- Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
[2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Amirata Ghorbani, Abubakar Abid, and James Zou. Interpretation of neural networks is fragile. In Proceedings of the AAAI conference on artificial intelligence, pages 3681–3688, 2019a. Ghorbani et al. [2019b] Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. 
Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 
Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Amirata Ghorbani, James Wexler, James Y Zou, and Been Kim. Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. 
Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 
Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. 
Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 
3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. 
3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. 
Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. 
arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. 
Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. 
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. 
[2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. 
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. 
[2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. 
Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 
3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. 
arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. 
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. 
CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. 
ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. 
[2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 
Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
Schwalbe [2022] Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
Selvaraju et al. [2017] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
Ward Jr [1963] Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
Lundberg and Lee [2017] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. 
In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. 
ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. 
[2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. 
[2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. 
[2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. 
PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Towards automatic concept-based explanations. Advances in Neural Information Processing Systems, 32, 2019b. Goodman and Flaxman [2017] Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. 
[2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bryce Goodman and Seth Flaxman. European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. 
Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. 
[2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. 
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. 
Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. 
ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. 
[2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020: Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 
Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
Lundberg and Lee [2017] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
Posada-Moreno et al. [2022] Andrés Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
Schwalbe [2022] Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
Selvaraju et al. [2017] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
Ward Jr [1963] Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
- European union regulations on algorithmic decision-making and a “right to explanation”. AI Magazine, 38(3):50–57, 2017. Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018. Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021. He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE onf. computer vision and pattern recognition, pages 770–778, 2016. Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 
Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016.
Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Graziani et al. [2018] Mara Graziani, Vincent Andrearczyk, and Henning Müller. Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018.
Guidotti et al. [2021] Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021.
He et al. [2016] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 770–778, 2016.
Hoffmann et al. [2021] Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021.
Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(E) (working draft) edition, 2021.
ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, WD01 edition, 2022.
Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. 
CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. 
ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. 
[2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 
Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
- Andrés Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. 
[2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. 
- Regression concept vectors for bidirectional explanations in histopathology. In Understanding and Interpreting Machine Learning in Medical Image Computing Applications, pages 124–132. Springer International Publishing, 2018.
- Riccardo Guidotti, Anna Monreale, Dino Pedreschi, and Fosca Giannotti. Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE conf. computer vision and pattern recognition, pages 770–778, 2016.
- Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021.
- Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020: Information technology — Artificial intelligence — Overview of trustworthiness in artificial intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial intelligence — Functional safety and AI systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road vehicles — Safety and artificial intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International conference on machine learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. machine learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European conf. computer vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. 
Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. 
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. 
ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. 
Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. 
Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
- Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
- Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
- Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
- Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. 
[2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. 
Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. 
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. 
Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. 
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. 
Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Principles of Explainable Artificial Intelligence. In Explainable AI Within the Digital Transformation and Cyber Physical Systems: XAI Methods and Applications, pages 9–31. Springer International Publishing, 2021.
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proc. IEEE Conf. Computer Vision and Pattern Recognition, pages 770–778, 2016.
- Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021.
- Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. 
[2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. 
Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. 
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. 
Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. 
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. 
Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Deep residual learning for image recognition. In Proc. IEEE conf. computer vision and pattern recognition, pages 770–778, 2016.
- Adrian Hoffmann, Claudio Fanconi, Rahul Rade, and Jonas Kohler. This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021.
- Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020: Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1st edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International conference on machine learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European conf. computer vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. 
[2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. 
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. 
[2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. 
[2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. 
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 
Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. 
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. 
Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. 
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. 
[2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. 
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. 
A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. 
Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. 
[2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. 
Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. 
[2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. 
Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. 
[2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. 
Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. 
[2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. 
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. 
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. 
Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. 
Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- This Looks Like That… Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks. arXiv:2105.02968 [cs], 2021. Hohman et al. [2020] Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. 
[2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Fred Hohman, Haekyu Park, Caleb Robinson, and Duen Horng Polo Chau. Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020. Houben et al. [2022] Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. 
ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. 
Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2020] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. 
[2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. 
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020 Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020. ISO/IEC JTC 1/SC 42 Artificial Intelligence [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. 
[2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. 
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
Ward Jr [1963] Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
ISO/IEC JTC 1/SC 42 [2021] ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, WD01 edition, 2022.
Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
Lundberg and Lee [2017] Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016.
Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
Schwalbe [2022] Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
Selvaraju et al. [2017] Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. 
In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. 
Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. 
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. 
arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. 
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Summit: Scaling Deep Learning Interpretability by Visualizing Activation and Attribution Summarizations. IEEE Transactions on Visualization and Computer Graphics, 26(1):1096–1106, 2020.
- Sebastian Houben, Stephanie Abrecht, Maram Akila, Andreas Bär, Felix Brockherde, Patrick Feifel, Tim Fingscheidt, Sujan Sai Gannamaneni, Seyed Eghbal Ghobadi, Ahmed Hammam, Anselm Haselhoff, Felix Hauser, Christian Heinzemann, Marco Hoffmann, Nikhil Kapoor, Falk Kappel, Marvin Klingner, Jan Kronenberger, Fabian Küppers, Jonas Löhdefink, Michael Mlynarski, Michael Mock, Firas Mualla, Svetlana Pavlitskaya, Maximilian Poretschkin, Alexander Pohl, Varun Ravi-Kumar, Julia Rosenzweig, Matthias Rottmann, Stefan Rüping, Timo Sämann, Jan David Schneider, Elena Schulz, Gesina Schwalbe, Joachim Sicking, Toshika Srivastava, Serin Varghese, Michael Weber, Sebastian Wirkert, Tim Wirtz, and Matthias Woehrle. Inspect, understand, overcome: A survey of practical methods for AI safety. In Deep Neural Networks and Data for Automated Driving: Robustness, Uncertainty Quantification, and Insights Towards Safety, pages 3–78. Springer International Publishing, 2022.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020: Information Technology — Artificial Intelligence — Overview of Trustworthiness in Artificial Intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. 
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. 
Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. 
arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. 
Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. 
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. 
Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014.
Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E) - Artificial Intelligence — Functional Safety and AI Systems. ISO, 202x(e) (working draft) edition, 2021. ISO/TC 22/SC 32 [2022] ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. 
Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. 
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 
Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. 
ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. 
Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. 
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. 
Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. 
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
- Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Ward Jr [1963] Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 
” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. 
Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. 
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 24028:2020: Information technology — Artificial intelligence — Overview of trustworthiness in artificial intelligence. ISO, 1 edition, 2020.
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial intelligence — Functional safety and AI systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road vehicles — Safety and artificial intelligence. ISO, wd01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, wd01 edition, 2022. Jocher [2020] Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020. Kazhdan et al. [2021] Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021. Kim et al. [2018] Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. 
[2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
- ISO/IEC JTC 1/SC 42 Artificial Intelligence. ISO/IEC TR 5469:202x(E): Artificial Intelligence — Functional Safety and AI Systems. ISO, working draft edition, 2021.
- ISO/TC 22/SC 32. ISO/AWI PAS 8800(En): Road Vehicles — Safety and Artificial Intelligence. ISO, WD01 edition, 2022.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C. Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symposium, Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conference on Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conference on Computer Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. 
Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. 
In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
[2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. 
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. 
Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. 
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. 
arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (tcav). In International conference on machine learning, pages 2668–2677. PMLR, 2018. Koh et al. [2020] Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. conf. Machine Learning, pages 5338–5348. PMLR, 2020. Lapuschkin et al. [2019] Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. 
[2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019. Li et al. [2023] Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. 
ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. 
In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. 
In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. 
[2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. 
Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. 
Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Glenn Jocher. YOLOv5 in PyTorch, ONNX, CoreML, TFLite. https://github.com/ultralytics/yolov5, 2020.
- Dmitry Kazhdan, Botty Dimanov, Helena Andres Terre, Mateja Jamnik, Pietro Liò, and Adrian Weller. Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. 
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. 
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 
Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. 
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. 
[2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. 
Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR, abs/2104.06917, 2021.
- Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, et al. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In International Conference on Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors.
arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. 
Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. 
[2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. 
In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. 
Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. 
[2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. 
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. 
Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). In International Conference on Machine Learning, pages 2668–2677. PMLR, 2018.
- Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. In Int. Conf. Machine Learning, pages 5338–5348. PMLR, 2020.
- Sebastian Lapuschkin, Stephan Wäldchen, Alexander Binder, Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023. Liang et al. [2021] Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. 
Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. 
[2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. 
Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. 
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. 
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. 
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. 
Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. 
Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021. Lin et al. [2014] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. 
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. 
In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. 
[2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. 
Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. 
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. 
Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Unmasking Clever Hans predictors and assessing what machines really learn. Nature Communications, 10(1):1096, 2019.
- Zhong Li, Yuxuan Zhu, and Matthijs Van Leeuwen. A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In European onf. computer vision, pages 740–755. Springer, 2014. Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. 
Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. 
[2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. 
Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. 
James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. 
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 
Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. 
arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- A survey on explainable anomaly detection. ACM Transactions on Knowledge Discovery from Data, 18(1):1–54, 2023.
- Yu Liang, Siguang Li, Chungang Yan, Maozhen Li, and Changjun Jiang. Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conf. Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. 
Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. 
[2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. 
Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
- Explaining the black-box model: A survey of local interpretation methods for deep neural networks. Neurocomputing, 419:168–182, 2021.
- Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.
- Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pages 21–37. Springer, 2016.
- Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
- Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. 
arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. 
[2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 
” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 
22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. 
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. 
Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Microsoft COCO: Common objects in context. In European conf. computer vision, pages 740–755. Springer, 2014.
Linardatos et al. [2021] Pantelis Linardatos, Vasilis Papastefanopoulos, and Sotiris Kotsiantis. Explainable AI: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021.
Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. SSD: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016.
Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023.
Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017.
Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
Posada-Moreno et al. [2022] Andrés Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a.
Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017.
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. 
Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. 
[2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. 
Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. 
Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. 
Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. 
arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. 
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. 
Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. 
arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. 
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Explainable ai: A review of machine learning interpretability methods. Entropy, 23(1):18, 2021. Liu et al. [2016] Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, and Alexander C Berg. Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. 
In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. 
[2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. 
arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. 
In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 
” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. 
In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. 
In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Ssd: Single shot multibox detector. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I 14, pages 21–37. Springer, 2016. Longo et al. [2023] Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. 
In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Luca Longo, Mario Brcic, Federico Cabitza, Jaesik Choi, Roberto Confalonieri, Javier Del Ser, Riccardo Guidotti, Yoichi Hayashi, Francisco Herrera, Andreas Holzinger, Richard Jiang, Hassan Khosravi, Freddy Lecue, Gianclaudio Malgieri, Andrés Páez, Wojciech Samek, Johannes Schneider, Timo Speith, and Simone Stumpf. Explainable artificial intelligence (xai) 2.0: A manifesto of open challenges and interdisciplinary research directions, 2023. Losch et al. [2019] Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. 
fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Max Losch, Mario Fritz, and Bernt Schiele. Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019. Loshchilov and Hutter [2018] Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. 
Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018. Lucieri et al. [2020] Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020. Lundberg and Lee [2017] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. 
Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in neural information processing systems, 30, 2017. Marconato et al. [2022] Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. 
[2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. 
Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. 
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. 
[2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. 
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. 
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. 
Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. 
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Interpretability beyond classification output: Semantic bottleneck networks. In Proc. 3rd ACM Computer Science in Cars Symp. Extended Abstracts, 2019.
- Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, leak-proof concept-based models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. 
Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022. Mikriukov et al. [2023a] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. 
[2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. 
Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. 
Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 
Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. 
Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. 
arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Decoupled Weight Decay Regularization. In International Conference on Learning Representations, 2018.
- Adriano Lucieri, Muhammad Naseer Bajwa, Andreas Dengel, and Sheraz Ahmed. Explaining AI-based decision support systems using concept localization maps. In Neural Information Processing, pages 185–193. Springer International Publishing, 2020.
- Scott M. Lundberg and Su-In Lee. A unified approach to interpreting model predictions. Advances in Neural Information Processing Systems, 30, 2017.
- Emanuele Marconato, Andrea Passerini, and Stefano Teso. GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. 
Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 
” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. 
Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. 
In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in cnns for robust explainability. arXiv preprint arXiv:2304.14864, 2023a. Mikriukov et al. [2023b] Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b. Müllner [2013] Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for r and python. Journal of Statistical Software, 53:1–18, 2013. Murdoch et al. [2019] W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. [2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019. Novello et al. 
[2022] Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. 
Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. 
Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. 
Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 
” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. 
Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. 
In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. 
In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- GlanceNets: Interpretable, Leak-proof Concept-based Models. In Advances in Neural Information Processing Systems, pages 21212–21227, 2022.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. 
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. 
Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. 
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Evaluating the stability of semantic concept representations in CNNs for robust explainability. arXiv preprint arXiv:2304.14864, 2023a.
- Georgii Mikriukov, Gesina Schwalbe, Christian Hellert, and Korinna Bade. Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andrés Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv preprint arXiv:2203.13909, 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. 
Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. 
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. 
Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. 
On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Quantified semantic comparison of convolutional neural networks. arXiv preprint arXiv:2305.07663, 2023b.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144, 2016a.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier.
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. 
arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. 
Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Daniel Müllner. fastcluster: Fast hierarchical, agglomerative clustering routines for R and Python. Journal of Statistical Software, 53:1–18, 2013.
- W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, and Bin Yu. Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in Industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors.
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. 
Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. 
[2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. 
PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Definitions, methods, and applications in interpretable machine learning. Proceedings of the National Academy of Sciences, 116(44):22071–22080, 2019.
- Paul Novello, Thomas Fel, and David Vigouroux. Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022.
- Vitali Petsiuk, Abir Das, and Kate Saenko. RISE: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018.
- Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. ECLAD: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. 
In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. 
[2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. 
Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. 
[2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. 
Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. 
Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. 
Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. 
Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Making sense of dependence: Efficient black-box explanations using dependence measure. Advances in Neural Information Processing Systems, 35:4344–4357, 2022. Petsiuk et al. [2018] Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. 
arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Vitali Petsiuk, Abir Das, and Kate Saenko. Rise: Randomized input sampling for explanation of black-box models. arXiv preprint arXiv:1806.07421, 2018. Posada-Moreno et al. [2022] Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 
” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Andres Felipe Posada-Moreno, Nikita Surya, and Sebastian Trimpe. Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022. Posada-Moreno et al. [2023] Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. 
A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. 
Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. 
[2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. 
”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. 
[2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. 
In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. 
IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 
Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. 
[2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. 
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. 
PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Eclad: Extracting concepts with local aggregated descriptors. arXiv preprint arXiv:2206.04531, 2022.
- Andrés Felipe Posada-Moreno, Kai Müller, Florian Brillowski, Friedrich Solowjow, Thomas Gries, and Sebastian Trimpe. Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023.
- Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020.
- Joseph Redmon and Ali Farhadi. YOLOv3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept embedding analysis: A review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. 
In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. 
Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Scalable concept extraction in industry 4.0. arXiv preprint arXiv:2306.03551, 2023. Rabold et al. [2020] Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. 
Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Johannes Rabold, Gesina Schwalbe, and Ute Schmid. Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. 
Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. 
[2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
- Expressive explanations of DNNs by combining concept analysis with ILP. In KI 2020: Advances in Artificial Intelligence, pages 148–162. Springer International Publishing, 2020. Redmon and Farhadi [2018] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018. Ribeiro et al. [2016a] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. ” why should i trust you?” explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1135–1144, 2016a. Ribeiro et al. [2016b] Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. 
[2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. ”Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b. Rudin [2019] Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. 
Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 
2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019. Saeki et al. [2019] Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019. Schwalbe [2021] Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021. Schwalbe [2022] Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. 
In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022. Schwalbe and Finzel [2023] Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023. Selvaraju et al. [2017] Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. 
Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. 
[2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. 
Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Marco Túlio Ribeiro, Sameer Singh, and Carlos Guestrin. "Why should I trust you?": Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Mao Saeki, Jun Ogata, Masahiro Murakawa, and Tetsuji Ogawa. Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- “Why should I trust you?”: Explaining the predictions of any classifier. In Proc. 22nd ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining, pages 1135–1144. ACM, 2016b.
- Cynthia Rudin. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence, 1(5):206–215, 2019.
- Wang et al.
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ramprasaath R Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. 
Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. 
[2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. 
CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. 
Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. 
In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. 
Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Visual explanation of neural network based rotation machinery anomaly detection system. In 2019 IEEE International Conference on Prognostics and Health Management (ICPHM), pages 1–4. IEEE, 2019.
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. 
In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Gesina Schwalbe. Verification of size invariance in DNN activations using concept embeddings. In Artificial Intelligence Applications and Innovations, pages 374–386. Springer International Publishing, 2021.
- Gesina Schwalbe. Concept Embedding Analysis: A Review. arXiv:2203.13909 [cs, stat], 2022.
- Gesina Schwalbe and Bettina Finzel. A comprehensive taxonomy for explainable artificial intelligence: A systematic survey of surveys on methods and concepts. Data Mining and Knowledge Discovery, pages 1–59, 2023.
- Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, and Dhruv Batra. Grad-CAM: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, pages 618–626, 2017.
- Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
- Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Computer Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pages 618–626, 2017. Simonyan and Zisserman [2014] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. 
[2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. 
[2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. 
[2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. 
PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. 
In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. 
[2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014. Simonyan et al. [2013] Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013. Smilkov et al. [2017] Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. 
[2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017. Springenberg et al. [2014] Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. 
In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014. Sundararajan et al. [2017] Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. 
[2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International conference on machine learning, pages 3319–3328. PMLR, 2017. Vilone and Longo [2020] Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. 
Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Deep inside convolutional networks: Visualising image classification models and saliency maps. arXiv preprint arXiv:1312.6034, 2013.
- Daniel Smilkov, Nikhil Thorat, Been Kim, Fernanda Viégas, and Martin Wattenberg. SmoothGrad: Removing noise by adding noise. arXiv preprint arXiv:1706.03825, 2017.
- Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, and Martin Riedmiller. Striving for simplicity: The all convolutional net. arXiv preprint arXiv:1412.6806, 2014.
- Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic attribution for deep networks. In International Conference on Machine Learning, pages 3319–3328. PMLR, 2017.
- Giulia Vilone and Luca Longo. Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020.
- Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021.
- Joe H. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
- Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020.
- Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc concept bottleneck models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022.
- Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018.
- Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, and Benjamin I. P. Rubinstein. Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021.
- Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016.
- Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Explainable artificial intelligence: A systematic review. CoRR, abs/2006.00093, 2020. Wang et al. [2021] Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jiaqi Wang, Huafeng Liu, Xinyue Wang, and Liping Jing. Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. 
Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. 
Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Interpretable image recognition by constructing transparent embedding space. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 895–904, 2021. Ward Jr [1963] Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. 
Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. 
Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Joe H Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American statistical association, 58(301):236–244, 1963. Yeh et al. [2020] Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Chih-Kuan Yeh, Been Kim, Sercan Arik, Chun-Liang Li, Tomas Pfister, and Pradeep Ravikumar. On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. 
AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- On completeness-aware concept-based explanations in deep neural networks. In Advances in Neural Information Processing Systems 33, pages 20554–20565, 2020. Yuksekgonul et al. [2022] Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Mert Yuksekgonul, Maggie Wang, and James Zou. Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. 
Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Post-hoc Concept Bottleneck Models. In ICLR 2022 Workshop on PAIR2Struct: Privacy, Accountability, Interpretability, Robustness, Reasoning on Structured Data, 2022. Zhang et al. [2018] Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Quanshi Zhang, Ruiming Cao, Feng Shi, Ying Nian Wu, and Song-Chun Zhu. Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Interpreting CNN knowledge via an explanatory graph. In Proc. 32nd AAAI Conf. Artificial Intelligence, pages 4454–4463. AAAI Press, 2018. Zhang et al. [2021] Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A Ehinger, and Benjamin IP Rubinstein. Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Invertible concept-based explanations for cnn models with non-negative concept activation vectors. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 11682–11690, 2021. Zhou et al. [2016] Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Bolei Zhou, Aditya Khosla, Àgata Lapedriza, Aude Oliva, and Antonio Torralba. Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Learning deep features for discriminative localization. In Proc. 2016 IEEE Conf. Comput. Vision and Pattern Recognition, pages 2921–2929. IEEE Computer Society, 2016. Zhou et al. [2021] Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021. Jianlong Zhou, Amir H. Gandomi, Fang Chen, and Andreas Holzinger. Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.
- Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5):593, 2021.